50 research outputs found

    Improved treebank querying: a facelift for GrETEL

    We describe improvements to the interface of GrETEL, an online tool for querying treebanks. We show how we used the results of two usability tests and individual user feedback to create a more user-friendly interface that meets users’ needs.

    Querying large treebanks : benchmarking GrETEL indexing

    The amount of data available for research is growing rapidly, yet the technology to interpret and mine these data efficiently lags behind. For instance, when large treebanks are used for linguistic research, query speed leaves much to be desired. GrETEL Indexing, or GrInding, tackles this issue. The idea behind GrInding is to make the search space as small as possible before the treebank search starts, by pre-processing the treebank at hand. We recursively divide the treebank into smaller parts, called subtree-banks, which are then converted into database files. All subtree-banks are organized according to their linguistic dependency pattern and labeled as such. Additionally, general patterns are linked to more specific ones. In this way we create millions of databases, and for any given linguistic structure we know in which databases that structure can occur, which yields a significant efficiency boost. We present the results of a benchmark experiment that tests the effect of the GrInding procedure on the SoNaR-500 treebank.
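
    The indexing idea in this abstract can be illustrated with a minimal sketch. It is not the GrInding implementation: the tuple-based tree format, the function names and the in-memory dictionary (standing in for the database files) are all assumptions made for illustration, and the dependency labels are only sample values. Each subtree is filed under the sorted relations of its children, and a query pattern is matched against every stored pattern that contains it, which is one simple way to link a general pattern to more specific ones.

```python
from collections import Counter, defaultdict

# Hypothetical toy tree format: (relation, category, [children]).
# A real treebank would be stored in database files, not in memory.


def pattern_of(node):
    """Label a node by the sorted dependency relations of its children."""
    _, _, children = node
    return tuple(sorted(rel for rel, _, _ in children))


def build_index(treebank):
    """Map each child-relation pattern to the subtrees that show it."""
    index = defaultdict(list)

    def visit(sentence_id, node):
        pattern = pattern_of(node)
        if pattern:  # leaves have no children, so nothing to index
            index[pattern].append((sentence_id, node))
        for child in node[2]:
            visit(sentence_id, child)

    for sentence_id, root in treebank.items():
        visit(sentence_id, root)
    return index


def candidate_patterns(index, query):
    """Return stored patterns that contain every relation in the query."""
    wanted = Counter(query)
    # Counter subtraction drops non-positive counts, so an empty result
    # means the stored pattern covers the whole query pattern.
    return sorted(p for p in index if not wanted - Counter(p))


TREEBANK = {
    "s1": ("top", "smain", [
        ("su", "n", []),
        ("hd", "ww", []),
        ("obj1", "np", [("det", "lid", []), ("hd", "n", [])]),
    ]),
    "s2": ("top", "smain", [("su", "n", []), ("hd", "ww", [])]),
}

INDEX = build_index(TREEBANK)
print(candidate_patterns(INDEX, ("hd", "su")))
```

    Only the subtrees filed under the returned patterns would need to be searched; subtrees filed under any other pattern can be skipped.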

    Treebank querying with GrETEL 3 : bigger, faster, stronger

    We describe the new version of GrETEL (http://gretel.ccl.kuleuven.be/gretel3), an online tool that allows users to query treebanks by means of a natural language example (example-based search) or a formal query (XPath search). The new release comprises an update to the interface and considerable improvements to the back-end search mechanism. The front-end update is based on user suggestions. In addition to an overall design update, major changes include a more intuitive query builder in the example-based search mode and a visualizer for syntax trees that is compatible with all modern browsers. Moreover, results are presented to the user as soon as they are found, so users can browse the matching sentences before the treebank search has completed. We will demonstrate that these changes considerably improve the query procedure. The back-end update mainly consists of optimizing the search algorithm for querying the (very) large SoNaR treebank. Querying this 500-million-word treebank was already possible in the previous version of GrETEL, but the complex search mechanism often led to long query times or even a timeout before the search completed. The improved search algorithm gives faster query times and more accurate search results, which greatly enhances the usability of the SoNaR treebank for linguistic research.
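
    To give an impression of what a formal XPath query over a syntax tree looks like, here is a small sketch using Python's standard library. The XML fragment is a hypothetical, simplified dependency tree written for this example, not output from GrETEL, and the function name `subject_heads` is likewise an assumption. The query selects the head word of every subject.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified parse of "de hond blaft" ("the dog barks").
SENTENCE = """
<alpino_ds>
  <node rel="top" cat="smain">
    <node rel="su" cat="np">
      <node rel="det" pt="lid" word="de"/>
      <node rel="hd" pt="n" word="hond"/>
    </node>
    <node rel="hd" pt="ww" word="blaft"/>
  </node>
</alpino_ds>
"""


def subject_heads(xml_text):
    """Return the words that head a subject (rel='su') constituent."""
    root = ET.fromstring(xml_text.strip())
    # ElementTree supports a limited XPath subset; this path selects
    # head nodes that are direct children of a subject node.
    heads = root.findall(".//node[@rel='su']/node[@rel='hd']")
    return [node.get("word") for node in heads]


print(subject_heads(SENTENCE))
```

    The verb "blaft" also carries the relation `hd`, but it is not a child of the subject node, so the query does not return it.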

    Improving the translation environment for professional translators

    When computer-aided translation systems are used in a typical professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view and from a purely technological one. This paper describes the SCATE research on improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human-computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions we drew, providing an overview of the highlights of the entire SCATE project.

    Relatório de estágio em farmácia comunitária

    Internship report written as part of the Integrated Master's degree in Pharmaceutical Sciences, submitted to the Faculty of Pharmacy of the Universidade de Coimbra.

    Complement Raising and Cluster Formation in Dutch. A Treebank-supported Investigation.

    Dutch is well known for its verb clusters, i.e. constructions in which multiple verbs group together. This dissertation presents the most influential analyses of verb clusters in descriptive and generative syntax. It discusses phenomena that are typically related to cluster formation, such as the occurrence of an infinitive where one expects a past participle (i.e. Infinitivus Pro Participio or the IPP effect), word order variation, and the interruption of clusters by non-verbal material. Furthermore, this dissertation investigates how a corpus-based study can shed new light on current syntactic theories of cluster formation. For the corpus study, syntactically annotated corpora, or treebanks, are used, since they allow for the empirical investigation of Dutch syntax beyond the lexical level. The observations from the treebanks regarding the set of clustering verbs, the word order variation in verb clusters, and the instances of cluster interruption are compared to the literature. Special attention is paid to constructions containing te-infinitives, as it is not always trivial to decide whether they are part of the verb cluster or not. Based on the results of the corpus study, a novel analysis of verb clusters is proposed in the framework of Head-driven Phrase Structure Grammar (HPSG). It is demonstrated that this analysis deals more adequately with verb clusters than previous HPSG approaches. An important consequence of the new analysis is that it not only deals with genuine verb clusters, but also accounts for ambiguous constructions. In addition, it extends to the analysis of other phenomena, such as adposition stranding. LOT dissertation series 413. Number of pages: 300. Status: published.

    The IPP effect in Afrikaans: a corpus analysis

    Compared to well-resourced languages such as English and Dutch, NLP tools for linguistic analysis of Afrikaans are still not abundant. To facilitate corpus-based linguistic research on Afrikaans, we are creating a treebank based on the Taalkommissie corpus. We adapted a tokenizer and a shallow parser, and used a TnT tagger for part-of-speech annotation. The first linguistic phenomenon we are investigating is the occurrence of infinitivus pro participio (IPP) in Afrikaans. IPP refers to constructions with a perfect auxiliary in which an infinitive appears instead of the expected past participle. The phenomenon has been studied extensively in Dutch and German, but studies on Afrikaans IPP triggers are sparse. The literature often mentions that, in contrast to those two languages, IPP occurs optionally in Afrikaans. We want to test this claim by means of a corpus analysis. Status: published.

    Moet dit nie breek nie! Indringers in Afrikaanse werkwoordsgroepen

    Status: published.